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' Abstract 

o : 

We propose a new method for the Maximum Likelihood Estimator (MLE) of 
, nonlinear mixed effects models when the variance matrix of Gaussian random ef- 

| fects has a prescribed pattern of zeros (PPZ). The method consists of coupling the 

recently developed Iterative Conditional Fitting (ICF) algorithm with the Expecta- 
. tion Maximization (EM) algorithm. It provides positive definite estimates for any 

sample size, and does not rely on any structural assumption concerning the PPZ. 
■ It can be easily adapted to many versions of EM. 

Keywords: nonlinear mixed effects models; maximum likelihood; expected maximisation algorithm; 
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1 Introduction 

p ; 

Nonlinear mixed effects models are widely used in population Pharmacology for Phar- 
macokinetics & Pharmacodynamics (PK/PD) modelling. Such models can be seen as 
special cases of repeated measurement data, for which the asymptotics concern the num- 
>• . ber of individuals rather than the number of measures per individual, see for instance [8] 
and references therein. In this article, we show that for nonlinear mixed effects models 
with Gaussian random effects, Expectation Maximization (EM) like algorithms for the 
computation of the Maximum Likelihood Estimator (MLE) can be coupled with Iterative 
Conditional Fitting (ICF) like algorithms in order to take into account a prescribed pat- 
tern of zeros (PPZ) in the variance matrix of the random effect. The ICF algorithm has 
been developed very recently pE] in the context of directly observed Gaussian graphical 
models. Finding an adequate approach for generic PPZ in the context of nonlinear mixed 
effects models is a long standing problem. Our approach provides a true solution for the 
M step of EM in this context, for any PPZ. It is thus far more satisfactory than the 
standard approaches used in the existing software packages such as NLMIXED (SAS), 
NONMEM, nlme (S-Plus and GNU-R), or Monolix. 

For instance, a traditional model used in population PK/PD is of the form 

Yi = FiX,) + g(Xi, 9)ei, l<t<N, (1) 
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where Yi is the vector of concentrations/effects observed on the i th individual for the drug 
of interest. Here F is a known function, often nonlinear. The q- vectors Xi represent 
the unobserved individual parameters assumed independent and identically distributed 
J\f (m, E), the Si are M(0,I) , unobserved, independent of the Xi and independent. The 
matrix g(Xi, 9) is the Cholesky transform of a positive definite matrix that depends on a 
parameter 9 G C W. 

For instance, Figured] represents the maximum concentrations of Cortisol (Yi) obtained 
after giving fixed doses of ACTH to iV = 30 horses. Each individual has its own curve 
described by the parameter X^. The g(Xi, 9) matrix defines the variance heterogeneity of 
concentrations obtained with different doses. This matrix is often assumed to be equal to 
#diag(|.F(Xj|). The reader will later find a study of a model like (CQ) for this Cortisol data 
set. 

One of the main goals in population PK/PD is to describe the distribution of the X^s 
by observation of the Y^s. This amounts to the estimation of (m, E,#) from the Y^'s. 
Recall that the X^s are not observed. A natural approach is to compute the MLE by 
maximizing 

N 

L{m, E, 0) = J] J ^ -y^ (g-\x t , 9){Y t - F{ Xi ))) <f> m A x i) dx i ( 2 ) 

where 0m,s(-) is the probability density function of the Af (m, E) distribution and | g(xi, 9) \ 
denotes the determinant of the matrix \g(xi,9)\. Except for specific models, such as 
Gaussian linear mixed effects models, maximum likelihood estimators have no closed 
form. Several methods have been proposed for estimation of the parameter (m, E, 9) in 
these models. The methods suggested by Beal and Sheiner [2] or Lindstrom and Bates 
[H] are based on a linearization of the conditional model (CQ) with respect to the vector Xi 
about or about a posterior mode. Pinheiro and Bates |15j . Vonesh and Carter [IB] , and 
Wolfinger POJ proposed Laplacian approximations of the likelihood. Importance sampling 
approximations [15J, Gaussian quadratures [5], and pseudo-likelihood methods [5] have 
also been investigated. The reader will find a detailed analysis of these methods in the 
book by Davidian and Giltinan [8] . More recently, stochastic versions of the EM algorithm 
have been proposed, see for instance [13] and [19]. These EM like algorithms converge to 
the MLE under some regularity and identifiability conditions. 

In many real situations, the kineticist's knowledge of the drug mechanism imposes a 
specific independence pattern on some components of Xi. This means that the variance 
matrix E contains a PPZ. The estimation of (m, E, 9) in the presence of a PPZ in E is 
problematic due to the positive definiteness constraint in the optimization. Pinheiro and 
Bates [T3] studied different parameterizations of E that ensure the definite positiveness of 
the estimate. In particular they suggested the usage of a Cholesky like parameterization. 
Unfortunately, except for the case where E is a block diagonal matrix up to coordinates 
permutation, Cholesky like parametrizations do not preserve the structure of the PPZ 
and are thus useless. Kuhn and Lavielle proposed estimating E in two steps in the 
implementation of their EM like algorithm. First, E is estimated without any constraint, 
then zeros are plugged according to the PPZ into the estimate provided by the first step. 
This method is widely used in practice. Unfortunately, by "forcing the zeros" in this way, 
nothing guarantees that the obtained estimate is still a positive matrix, and even when it 
is positive definite, it is not the maximum likelihood in general. 

For Gaussian linear mixed effects models, the algorithm of Anderson pQ deals with 
any linear hypothesis on the variance matrix of the random effect (a PPZ for instance). 
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Unfortunately, the estimate is not necessarily positive definite, see for instance [3J. To 
our knowledge, no method is available for MLE of nonlinear models such as (JTJ) when the 
variance matrix £ of the random effect has a PPZ. 

The aim of this article is to propose a general method for the estimation of (m, S, 6) 
in the presence of a PPZ in S. The method uses the ICF algorithm to perform the 
Maximization step of the EM algorithm. In other words, we couple EM and ICF in order 
to compute the MLE (or at least a stationary point of the likelihood) of (m, E, 6) when 
£ has a PPZ. 

The ICF algorithm was developed recently by Chaudhuri et al. in |?J to estimate 
a variance matrix with PPZ of observed Gaussian random variables. In contrast, the 
random effects Aj's in (OQ) are Gaussian but not observed, and that is why we couple 
ICF with EM. The ICF converges towards positive definite saddle-points or local maxima 
of the likelihood function irrespective of the PPZ. The idea behind ICF is not new in 
the framework of graphical models, and is inspired by the famous Iterative Proportional 
Fitting (IPF) algorithm. We refer to [I] for a review. Some alternative algorithms to ICF 
are available for specific PPZ, such as chain graph models [6j or non-chordal graph models 
[7j. The ICF algorithm is attractive because it does not rely on a specific structure of the 



The rest of the article is organized as follows. In section 2, we give some of the 
properties and drawbacks of the popular "zero forced" estimator, that consists of plugging 
zeros according to the PPZ into a full variance matrix. In section 3, we recall the main 
properties of the EM algorithm for models such as ([I]). Section 4 is devoted to the ICF 
algorithm, and to the coupling of ICF with EM. Section 5 contains the step by step analysis 
of a model like ([T]) for the Cortisol data set depicted in Figure HJ In the last section, we 
perform a simulation study that quantifies the benefit of our EM+ICF approach on the 
model used for the Cortisol data set. 

2 The Zero forced estimator 

Assume for example that for some specific model we get the following MLE for S: 



without taking into account the PPZ in E. We will refer to this as the unconstrained 
estimation. If the PPZ consists of £13 = £31 = 0, the "zero forced" estimation of £ is 
simply given by 



The unconstrained estimate £ uc is a positive definite matrix but the "zero forced" esti- 
mate Yj z f is not. However we know that for a regular model, the unconstrained MLE is 
consistent. Therefore £ uc converges componentwise towards the true matrix E with PPZ. 
Consequently, there exists a random sample size from which the "zero forced" estimator 
is a positive definite matrix but this sample size is somewhat difficult to obtain. 

A possible ploy allowing to build a positive definite consistent estimator of £ could be 
as follows. Compute the unconstrained estimator and denote it by £ z /, the corresponding 



PPZ. 
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"zero forced" estimator. Nothing guarantees that its lower eigenvalue \ m in is positive but 
since E 2 / is a consistent estimator of E, the quantity 

(Amm)- = max{-A min ,0} 

is a random sequence of positive numbers that converges almost-surely to zero. Now, 
consider some auxiliary sequence of positive real numbers (it at) that goes to zero with the 
sample size N (e.g. Un = 1/N 2 ), then, for any sample size N, the matrix 

)- + U N ) I 

is a positive definite consistent estimator of E, and features the same PPZ. Its main 
drawback is that its diagonal terms are biased and that the choice of the (it at) sequence is 
arbitrary. A better way to proceed is to directly consider the MLE of E with PPZ, which 
is precisely our aim in the next sections. 

3 The EM algorithm 

The EM algorithm [9] is a popular method to estimate parameters of a model with non- 
observed or incomplete data. Let us briefly recall how its general form works as introduced 
by Dempster et al. The EM algorithm consists of iterations of an Expectation and a 
Maximization step. At the k th iteration, the E step computes the conditional expectation 
of the log-likelihood of the complete data (Y, X) with respect to the distribution of the 
missing, or non-observed, data X given the observed data Y at the current estimated 
parameter value ip^: 

Q(V^ (fc) ) =E[\ogP(Y,X)\Y,^]. 
The M step finds ip( k+1 ^ so that for all ip in the parameter space \1/ 

^(k+l) = argsup Q (^«) . 

These two-step iterations are repeated until convergence. The essential property of the 
EM algorithm is that the likelihood increases monotonically along the iterations. Under 
some identifiability and regularity conditions, this algorithm converges to a stationary 
point of the likelihood, see for instance [22] . 

Let us now describe more precisely this algorithm for model ([T]). We need first to 
define the parameter space on which the M step is to be performed. In this model, the 
parameter to be estimated is ip = (m, E,#). The variance matrix E lives in a subset of 
the set Sy~ of q x q symmetric positive definite matrices. More precisely, let II be the set 
of subsets of {(i, j); 1 < i < j < q}. For any it ell, the set 

S+(n)±{AeS+Mh3)en,A 3 =0} 

is formed by the symmetric positive definite matrices that have zeros located in 7r. The 
PPZ in E is represented by an element tt of II. We thus assume that for some it G II, 
if) = (m, E,0) G ^ = M x 5+(7r) x 6 where M and 9 are open subsets of R q and IR P 
respectively. 
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At the k iteration the Expectation step consists of the computation of 
Q (m, E, 6\m k , E fc , 9 k ) = Ve flog * (^(JQ, 0)(y< - ^(X^F,, E fc , ro fc , 0» 

t \ \g{^i,v)\ 

where 

The maximization step computes 

(mfc+ijEfc+i.flfc+i) = arg sup Q (m, E, 9\m k , E fe , 9 k ) . 

MxS+(tt)xS 

For model ([T]), the integral that appears in the E step can be split into two parts . The 
E step reduces to calculate 

Q (m, E, e\m k , E k , k ) = Q x (m, E|m fc , E k , k ) + Q 2 {9\m k , E fc , 9 k ) 

where 

Qi (m,E\Yi,m k ,E k ,9 k ) = E (log <j> m> y(Xi)\Yi, m k , E fc ) 

i 

= -~^E((Xi- m)' E- 1 {Xi - m) \Y U m k , E fe ) - - log |E| 



-^(^E-(«-»)«-»)'k.^^) 



and 

Q 2 (0|m fc , E fc , 9 k ) 4 ^ E (log ((T^i, 0)(y 4 - F(X,))) - log(|^(X J; 6')|)|F i , m fc , E fc , fc ) . 

j 

It follows that the M step can also be decomposed into two parts 

sup Q (m, E, 9\m k , E k , 9 k ) = sup Q x (m,E|TO fc ,E fc , 9 k ) +supQ 2 (9\m k , T, k ,9 k ) 

MxS^(tt)xB MxS^(tt) 



e 



Note that in the M step the maximization with respect to (m, E) is separated from that 
of 9. The function Q 2 depends only on 9 via g. 

Remark 1. In most applications, h is the probability density function of a standard 
Gaussian distribution, and 9 is a variance matrix that can possibly contain a PPZ. In 
that case, its maximization can be performed using the ICF method, as for E, as described 
hereafter. 

Maximization of Q\ leads to 

m k+1 = — ^2 E (Xi\Yi, E fc , m k , 9 k ) (3) 
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and 



= arg inf tr [XT, 

SeS+(7r) V 



-1 



) 



+ log|£ 



(4) 



where 



1 



^ E {(Xi - m k+ i) (Xi - m k +i)' \Y h E k , m k , 9 k ) . 



(5) 



X 



N 



The matrix X is an empirical conditional variance matrix. The random vectors Xi are 
not observed, however, the matrix X can be evaluated at each iteration of the algorithm. 
When 7i = 0, that is when E has no PPZ, T, k +i reduces to X. When ix ^ the maximum 



4 Estimating the variance matrix with ICF 

We have seen in the previous section that the M step of the EM algorithm involves a 
maximization problem such as 



The difficulty here is that the optimization is not performed on the entire cone of sym- 
metric definite matrices but only on a sub-cone that contains the matrices with PPZ. It 
is clear that a standard gradient-like algorithm does not fit these constraints. The usual 
method to get rid of the definite-positiveness constraint is to use a Cholesky like decom- 
position. Unfortunately, these decompositions do not preserve the PPZ when £ is not a 
block diagonal matrix up to a permutation of the coordinates. The algorithm described 
hereafter allows one to move within S^(ir) whatever 7r may be. 

First, note that in the absence of PPZ in S, i.e. when tt = 0, the solution of (EI) 
is = X. We assume from now on that tt ^ 0. The case q = 2 is trivial since the 
only possible zero is S 12 = 0. In this case, the "zero forced" estimator is always positive 
definite and is the solution of ([6]). We assume in the sequel that q > 2 and that tt ^ 0. 

The method we propose is the core of the ICF algorithm presented in [J]. Even if 
Chaudhuri & al. did not express it at such, it is mainly based on the specific properties 
of the Schur complement of a matrix. Let us recall the following classical result, which 
can be found in [11 J or [21] for instance. 

Theorem 1 (Schur). Let I be an integer in {1, . . . ,q}. Consider two vectors U and V 
that respectively belong to ~R q ~ l and M 1 , the q-vector Z' = (U',V) and a matrix E G S| 
that admits the following block decomposition 



of Qi(T) must be sought in S£(n). The next section deals with this problem. 



= arg inf 

EeS+(7r) 




(6) 




where A G Sz_i, B is a (q — I) x I matrix and C G . 7 
the matrix S = C — A'~ l B. It belongs to 5", + and moreover 



The Schur complement of A is 



i) det(S) = det(A)det(5), 

ii) Z'T- l Z = U'A~ X U + {V - B'A- X U)' S- 1 (V - B'A~ X U) . 
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The Schur complement appears naturally when the random vector Z described in the 
previous Theorem is distributed according to jV (0, £). More precisely we have: 

C{V\U) =M{B'A- 1 U 1 S). 

Note that properties i) and ii) remain true after permutation of the rows of E. In the 
particular case where I = 1, we can easily derive the following property. 

Corollary 1. We keep the same notations as in the previous proposition. For any j G 
{l...q}, let A = E-j,-j be the submatrix of £ obtained by removing its j th row and 
column , B = Y-j.j the j th column vector of £ in which the j th row has been removed, 
C = T,jj. The column vector U and the positive real number V are respectively obtained by 
removing the j row of Z and as Zj. Then, using these notations, the Schur complement 



ofA = 
i) and ii) hold. 



-3,-3 



is the real positive number S given by the previous proposition and properties 



We are now able to solve the optimization problem (jlj) by running iteratively the 
decomposition of the Corollary over the columns of S. Set Tj = Xi — m k +i and note that 
from ([6]) the function that has to be minimized can be rewritten as 



K(Z,T U ...,T N ) = 1 J]E(7; / S- 1 T t |F J ,m fe ,S fc ) +log|S|. 



(7) 



For the j column of S we set A = _j, B = E-jj, 
so that we can now write (H) as 



C 



Ejj and S = C-B'A- 1 B 



K(Z, T 1; . . . , T N ) = K(A, [A,..., U N ) + K(S, V, - B'A^U,, ...,V N - B'A~ X U_ 



where Vi is the j th component of Tj and C/j is obtained by removing the j th component of 
Tj. Therefore, if A is fixed, the partial optimization of K(Tl,Ti, . . . ,T/v) with respect to 
(B, S) can be reduced to the global optimization of 



K(S, Vi — B'A^Ui , ...,V N -B , A- 1 U N ) = 

J- J^E ((^ - B'A^U,) 2 ft, m k , £ fc ) + \og(S) 



which is a standard least-squares problem. The optimization with respect to B and S 
leads to 



B, 



opt 



and 



5]E(Vrt|^m fc ,E t ) (8) 



.4 



J2^(U l U' l \Y t ,m k ,E k ) 



(9) 



(10) 



The vector B = j may contain some PPZ. These components are not optimized and 
are thus left at zero. This only decreases the dimension of the optimization problem. We 
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deduce that after this step on the j column of E, C — Ejj and B = E_j j must be 
respectively updated with 

E"J = S op t + B opt B op t and ^-jj = B op t. 

The striking property of this step is that it allows us to move within S£ without affecting 
the prescribed null components of E: A = ^-j.-j and the null components of B = E_jj 
are left unchanged. Since E is assumed positive definite C = Ej j cannot be zero. 

As already mentioned, iterations of these steps converge to a local maximum of 
A(E, Ti, . . . , Tjy), see for instance [I]. 



5 The Cortisol data set 



In this section, we use a practical example to illustrate the implementation of an EM 
algorithm coupled with the ICF algorithm. In order to explore the endocrine function of 
horse, a sample of horses (N = 30) was given eight doses of ACTH by intravenous route. 
The ACTH stimulates the adrenal gland that produces Cortisol. The concentration profiles 
of Cortisol in plasma were summarized by the maximal concentration reached after the 
ACTH administration (see Figured]). 

The seven doses of ACTH given to each animal were (in mg/kg) 



0.005, 0.01, 0.1, 
The production of Cortisol is modelled as 

X2idf 31 



0.5, 



10. 



Y 



A* 31 + df 31 



1 + ae 



1 < j < 7, 1 < i < 30, 



where is the maximal Cortisol concentration observed in the i th horse after adminis- 
tration of a dose dj of ACTH, X[ = (Xn, . . . ,X i4 ) is a random vector that contains the 
individual parameters for the i th animal. We assume that the random vectors Aj are inde- 
pendent and identically distributed Af (m, E) and that the residual terms e\ = (en, . . . , e^) 
are independent and identically distributed J\f (0,1-?). Moreover, the Aj's and £j's are as- 
sumed to be mutually independent. In this example, p = 4, 9 = a 2 and 



g(x, 



crdiag A x + 



x£ + < 3 



j=1...7 



According to the kineticist, the correlations between Xn and Aj 4 and X i3 and A i4 should 
be zero and thus E has the following structure 



/ 



. 
. / 



In some problems, no a priori information is available for the possible zero correlation 
between the components of Aj. The method of multiple testing of correlation, as described 
in Drton and Perlman [HI] , may be used in such cases to reveal the structure of E. 
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The estimation of model parameters requires evaluation of the conditional expecta- 
tions of functions such as E (f(X{, Fj)|Fj, rrik, erf) . A standard approach is to use a 
stochastic version of EM that consists of the simulation of a Markov Chain [Xf )i with 
P (.\Yi, Efe, rrik, of) as unique stationary distribution by using a Metropolis-Hastings algo- 
rithm and the approximation of the conditional expectation by 

1 ' 

E (/(X,, Yi)\Yi, E fe , m h , 4) « - £ /(xf , Fj). 

i=i 

For the analysis of the Cortisol data we chose L = 500. We simulated the Markov chain 
with the Metropolis-Hastings algorithm with M (rrik, E&) as the proposal distribution. In 
this case, the acceptance probability of the Metropolis-Hastings algorithm reduces to 

• ( <P{g-\X,al){Y - F(X)))\g(xt\al)\ \ 
\\g{X,al)\t{g-i{x«),al){Y - F{x«)))Y ) 

which only depends on the conditional distribution of the observation. 

The algorithm to estimate the model parameters can be summarized in the following 
scheme: 

1. Start from some initial guess So, rrio, <Jq and set k = ; 

2. Compute m fe+1 from (j3J) and 

a 2 k+1 = ^^E((F-F(X i ))'C- 1 (X l )(F-F(X J ))|F i ,m fc ,E fc ,^) , 

i 

where C~ 1 (X i ) is a diagonal matrix whose j th term is (l/(F(Xi))j) 2 . 

3. Set T,f = Sfc and and set I = 

4. for j:=l to q = 4 
increment I 

compute = 5^ + B' opt ^E^- ^J B opt and = B opt where S'opt and .Bopt 

are respectively defined by (ITUl) and (JED ; 

5. if E^ -4 ' 1 and E^ are not close enough go to step 4. Otherwise set S fe+1 = E^ ; 

6. Stop if (E fc ,TOfc,0f) and (E fc+1 , m fc+1 , are close enough. Otherwise increment 
k and go to step 2. 

For standard EM algorithms, i.e. when no constraint is imposed on the structure of E, 
steps 3), 4) and 5) of the previous algorithm should be replaced by the update of E^ 
according to equation 

It is well known that standard EM algorithms go quickly to a stationary point of the 
likelihood during the first iterations and then take time to converge. Since for stochas- 
tic EM algorithms the criterion being optimized changes randomly at each iteration, it 
is somewhat difficult to achieve and check convergence even when the length L of the 
simulated Markov Chain is large. Improvements have been proposed to overcome these 
problems. In particular Kuhn and Lavielle [13] suggested updates of the following form : 

Enetu = (1 — 7fc)^fc + 7fcEfc+l 

m new = (1 - 7fc)m fc + 7fcm fc+ i 
= ( 1 -7fcW + 7fc^+i, 
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where (jkjk is a decreasing sequence of positive numbers, (Ejfe+i, mk+i, Cfe+i) is defined as 
in the previous algorithm and (E ne „,, m new , cr 2 ew ) is the update of (E&, m^, o^). Note that 
this update scheme forces the algorithm to converge and preserves the PPZ as well as the 
positive defmiteness of E. 

The sequence (j^k should satisfy X]fc7fc = +°° an d Tlkll < +°°- These two con- 
ditions are fulfilled when 7^ = a/k b with a > and b G (0,1). Choosing 7^ = a/A; 
speeds-up convergence of the algorithm but the choice of a has to be made sample by 
sample. Choosing the same a for all samples can lead to poor estimations. As practical 
advice, we suggest choosing 7^ = l/k°' s . The algorithm takes more time to converge but 
a fine tuning of a is unnecessary. 

A well known drawback of the EM algorithm is that it does not produce standard errors 
as a by-product. We implemented the method proposed by Jamshidian and Jennrich 
[T2] . This method relies on numerical derivation and seems well-suited to the method we 
propose. Even if standard errors are helpful for comparing the results obtained with these 
two models, the Fisher information matrix gives pertinent quantitative information only 
when the sample size is large enough. However, N = 30 is probably not a large sample 
size. In the next section we use simulations to weight the performance of the estimation 
proposed for the Cortisol data. 

The estimation of the model parameters for the Cortisol data requires some initial 
estimates to be provided. Thanks to the model parametrization, we can directly read rea- 
sonable values for mo on Figure [U Since the four components of X respectively represent 
the basal value of Cortisol, the maximal increase, the "slope" of the sigmoid and the ACTH 
dose for which half the maximal increase is obtained we roughly get mo = (50, 70, 1, 0.1). 
We initialize E with the following diagonal matrix: E = Diag (0.01m 2 ,). Finally, for this 
heteroscedastic model, a can be interpreted as the coefficient of variation of the Cortisol 
for a given dose. We set it at 20% that is ctq = 0.2 2 . We estimated E with EM alone (no 
constraint was imposed) and with EM+ICF that preserves the PPZ. In this example, the 
algorithm seems to converge in less than 400 iterations. We implemented this algorithm 
in C++ with a matrix library. Estimates of the parameters obtained with EM alone were: 

/ 21.25(7.90) \ 

6.28 4.25(1.05) 

0.13 0.33 0.047(0.0071) 

\ 0.015 0.000867 -0.000267 0.0000213(0.000016) / 

The figures between brackets are the standard errors for the variances. We have chosen to 
give only some standard errors to lighten the presentation, rh = (50.03, 69.81, 1.78, 0.0845), 
se (m) = (0.87,1.84,0.085,0.0069) , a 2 = 0.0145 and logL(E,m,a 2 ) = -750.25. Esti- 
mates of the parameters obtained with the EM+ICF algorithm were: 

/ 19.50(7.85) \ 

-4.66 2.33(1.12) 

-0.29 -0.095 0.058(0.0063) 

\ -0.0024 0.0000144(0.000012) / 

rh = (48.84,71.46,1.47,0.0840), se (rh) = (0.82,1.81,0.084,0.0071) and a 2 = 0.0151 
and logL(E, fh, a 2 ) = —754.23. We can see that these likelihoods are about the same and 
a likelihood ratio test would not reject the PPZ proposed by the kineticist. The resid- 
ual variance estimates are also very close. Surprisingly, there are quite large differences 
between the estimates of the third component of m and the non null components of E. 
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Remark 2 (Modelling). The general problem of mean and variance modelling for longi- 
tudinal data is delicate, and several choices are possible, see for instance JE/, f21^ - [Tb\ [77| / ; 
f2by . and |5j/. Our model (fTTj) belongs to a standard family of models in PK/PD and was 
chosen with the kineticist. This relatively simple model is heteroscedastic with a constant 
coefficient of variation. An examination of the "individual" residuals shows that they are 
centered, which is quite satisfactory. 



6 Simulations 

The aim of this section is to quantify the potential benefit of directly estimating a variance 
matrix with PPZ. We simulated 100 data sets using model ( TTTT) with parameters close to 
the estimate found in the Cortisol data analysis : N = 30,m' = (50, 70, 1.5, 0.08), 

/ 20 \ 



a 2 = 0.015 and £ 



-4.5 2.5 
-0.3 -0.1 0.05 
\ -2 x HT 3 10~ 5 / 



Both EM and EM+ICF estimates were calculated. Results are given in Table [TJ 

As expected, the standard errors given in the example are smaller than those of Table 
[TJ For such sample sizes, which are often encountered in practice, asymptotic statistics 
should be interpreted with care. 

The mean parameter m seems to be well estimated. At least on these simulations, the 
S structure influences the estimation of m. However, we notice that the estimates obtained 
with EM+ICF have a smaller standard error and mean quadratic error (M.Q.E.) than 
those obtained without any constraint. On the whole EM+ICF also gives estimates with 
lower bias. This suggests that the mean and variance estimations are heavily dependent. 
This sheds light on approaches, such as the 'zero forced" method, that rely on estimating 
the full variance matrix first and modify it by forcing the PPZ: since all the non zero entries 
are estimated with the assumption that the variance matrix does not have prescribed 
zeros, they could be poorly estimated. This is consistent with the results obtained by Ye 
and Pan [23] who concluded, in a different context, that misspecification of the working 
variance structure could lead to a large loss in efficiency of the estimators of the mean 
parameters. 

Likelihood ratio tests were performed to test 

H : £gS+(tt) 
Hi : X G 

for 7r = {(1,4); (3,4)}. Note that whatever the value of it, H is not on the boundary 
of Consequently, the likelihood ratio statistics follow asymptotically a Chi-square 
distribution under Hq. Since the data have been simulated under Hq, the P- values dis- 
tribution should be close to a uniform law on (0, 1) at least for large N. The Q-Qplot 
of the P- Values is represented in Figure [2j This figure shows that the P- Values are not 
distributed according to a uniform distribution and thus the distribution of the likelihood 
ratio statistics is not close to a x 2 distribution. Consequently, N = 30 is probably not 
large enough to trust asymptotic statistics. 
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7 Conclusion 



We have proposed a method for the estimation of the variance matrix with PPZ in nonlin- 
ear mixed effects models. This method, which consists of coupling an ICF like algorithm 
with an EM like algorithm gives more efficient estimates than standard EM that ignore 
the PPZ. For the sake of simplicity, we have only presented the estimation algorithm for 
independent and identically distributed observations. Extension to different numbers of 
observations per individual is straightforward. We also restricted our study to models 
with Gaussian e. More general models in which the distribution of e is not Gaussian and 
depends on a parameter 62 can also be considered. This simply requires the Metropolis- 
Hastings chain to be chosen accordingly. We deliberately chose to show in section 2 a 
columnwise ICF implementation that can be extended using theorem 1 to blocks of S. 

Of course, our approach can be adapted without much effort to many versions of 
EM and many alternatives to ICF. For pedagogical reasons, we presented our EM+ICF 
coupling on a low dimensional example. The method of course is particularly suited to 
large variance matrices with a high percentage of prescribed zero entries. 
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EM 






EM+ICF 






True value 


Mean 


S.E. 


JM.Q.E. 


Mean 


S.E. 


JM.Q.E. 


En 

z — '11 


20 


17.3 


10.74 


11.07 


18.88 


9.09 


9.16 


^12 


-4.5 


-3.64 


3.16 


3.28 


-4.1 


2.39 


2.42 


^22 


2.5 


2.27 


1.48 


1.5 


2.74 


1.25 


1.28 


E 1? x 10 2 

lo w 


-30 


-8.04 


86.53 


89.27 


-12.27 


39.85 


43.62 


Eos x 10 2 


-10 


-7.57 


19.95 


20.1 


-10.66 


14.65 


14.66 


Esq X 10 3 


50 


74.36 


95.6 


98.66 


73.67 


65.71 


69.85 


E M x 10 4 


o 


-7 


91.62 


91.89 


o 


o 


o 


E 24 x 10 4 


-20 


49.78 


306.23 


314.08 


104.01 


204.02 


238.76 


E 34 x 10 5 





-43.69 


63.29 


76.9 











E 44 x 10 6 


10 


17.84 


14.15 


16.18 


18.57 


13.68 


16.15 


?7ll 


50 


51.18 


1.38 


1.82 


48.96 


1.23 


1.61 


777,2 


70 


70.73 


2.60 


2.70 


69.52 


2.44 


2.48 




1.5 


1.54 


0.22 


0.22 


1.49 


0.21 


0.22 


m 4 x 10 3 


80 


91.44 


4.08 


12.15 


90.06 


3.87 


10.78 


a 2 x 10 3 


15 


15.91 


1.77 


1.99 


14.07 


1.57 


1.82 


log like. 




-749.32 


11.39 




-751.54 


11.68 





Table 1: Empirical mean, standard error and square root of mean-quadratic-error of 
the estimates (M.Q.E.) obtained with EM and EM+ICF. The Mean Quadratic-Error is 
defined as bias 2 + Variance. 
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Figure 1: Maximum Cortisol concentrations observed after IV administrations of ACTH 
in 30 horses. 
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Figure 2: The Q-Q plot of the P-values of the likelihood ratio test versus the uniform 
distribution on (0, 1). 
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